-
Generative large language models (LLMs) exhibit impressive capabilities, which can be further augmented by integrating a pre-trained vision model into the original LLM to create a multimodal LLM (MLLM). However, this integration often significantly decreases performance on natural language understanding and generation tasks, compared to the original LLM. This study investigates this issue using the LLaVA MLLM, treating the integration as a continual learning problem. We evaluate five continual learning methods to mitigate forgetting and identify a technique that enhances visual understanding while minimizing linguistic performance loss. Our approach reduces linguistic performance degradation by up to 15% over the LLaVA recipe, while maintaining high multimodal accuracy. We also demonstrate the robustness of our method through continual learning on a sequence of vision-language tasks, effectively preserving linguistic skills while acquiring new multimodal capabilities.
Free, publicly-accessible full text available August 11, 2026.
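The abstract does not name the mitigation technique that was ultimately chosen; as a concrete illustration of the continual-learning framing, below is a minimal sketch of one classic candidate, elastic weight consolidation (EWC), on a toy PyTorch model. All shapes, data, and the penalty strength `lam` are hypothetical, and EWC is not claimed to be the paper's method.

```python
import torch
import torch.nn as nn

# Toy stand-in for a language model head; the real setting would attach a
# pre-trained vision encoder to an LLM (all shapes here are made up).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
loss_fn = nn.CrossEntropyLoss()

# Fake "language task" batch, used only to estimate parameter importance.
x_lang = torch.randn(64, 16)
y_lang = torch.randint(0, 4, (64,))

# 1) Diagonal Fisher approximation on the original (language) task.
model.zero_grad()
loss_fn(model(x_lang), y_lang).backward()
fisher = {n: p.grad.detach() ** 2 for n, p in model.named_parameters()}
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}

# 2) Fine-tune on the new (multimodal) task with a quadratic penalty that
#    discourages drift in parameters important for the language task.
x_mm = torch.randn(64, 16)
y_mm = torch.randint(0, 4, (64,))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 100.0  # penalty strength; hypothetical value

for _ in range(100):
    opt.zero_grad()
    task_loss = loss_fn(model(x_mm), y_mm)
    ewc = sum((fisher[n] * (p - anchor[n]) ** 2).sum()
              for n, p in model.named_parameters())
    (task_loss + lam * ewc).backward()
    opt.step()
```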
-
Smith-Renner, A.; Taele, P. (Eds.) Recruiting older adults for research studies is a challenging endeavor. We conducted an interview study to understand older adults’ preferences and expectations, with the goal of building a recommender system that supports the selection of suitable research studies. Our findings suggest that sharing the results of the studies they participated in would motivate older adults to participate in more studies and would give them a feeling of self-accomplishment and belonging. We list 15 design implications based on our user research and present a prototype system built on these design implications.
-
This paper presents JEDAI Explains Decision-Making AI (JEDAI), an AI system designed for outreach and educational efforts aimed at non-AI experts. JEDAI features a novel synthesis of research ideas from integrated task and motion planning and explainable AI. It helps users create high-level, intuitive plans while ensuring that they will be executable by the robot, and it provides users with customized explanations of errors, improving their understanding of AI planning as well as the limits and capabilities of the underlying robot system.
-
McCulloch, R. (Ed.) Varying coefficient models (VCMs) are widely used for estimating nonlinear regression functions for functional data. Their Bayesian variants, which place Gaussian process priors on the functional coefficients, have received limited attention in massive data applications, mainly due to prohibitively slow posterior computations with Markov chain Monte Carlo (MCMC) algorithms. We address this problem using a divide-and-conquer Bayesian approach. We first create a large number of data subsamples of much smaller size. We then formulate the VCM as a linear mixed-effects model and develop a data augmentation algorithm for obtaining MCMC draws on all the subsets in parallel. Finally, we aggregate the MCMC-based estimates of the subset posteriors into a single Aggregated Monte Carlo (AMC) posterior, which serves as a computationally efficient alternative to the true posterior distribution. Theoretically, we derive minimax optimal posterior convergence rates for the AMC posteriors of both the varying coefficients and the mean regression function, and we quantify the required orders of the subset sample sizes and the number of subsets. The empirical results show that combination schemes satisfying our theoretical assumptions, including the AMC posterior, deliver better estimation performance than their main competitors across diverse simulations and in a real data analysis.
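To make the divide-and-conquer pattern concrete, here is a minimal sketch on a toy Gaussian-mean model where each subset posterior is available in closed form. The precision-weighted averaging of aligned draws is one classic combination scheme (consensus-Monte-Carlo style); it stands in for, rather than reproduces, the paper's AMC aggregation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: N observations from N(mu, 1); the target is the posterior of mu.
N, K, n_draws = 100_000, 10, 5_000
y = rng.normal(2.0, 1.0, size=N)
subsets = np.array_split(y, K)

# Step 1: sample each subset posterior (in parallel, in a real pipeline).
# In this conjugate toy model the subset posterior is closed form,
# mu | y_j ~ N(ybar_j, 1/n_j) under a flat prior, so direct draws
# stand in for per-subset MCMC.
subset_draws = np.stack([
    rng.normal(s.mean(), np.sqrt(1.0 / len(s)), size=n_draws)
    for s in subsets
])  # shape (K, n_draws)

# Step 2: aggregate aligned draws with precision weights (consensus-style).
weights = np.array([len(s) for s in subsets], dtype=float)
combined = (weights[:, None] * subset_draws).sum(axis=0) / weights.sum()

print(f"combined posterior: mean {combined.mean():.4f}, sd {combined.std():.5f}")
print(f"full-data posterior sd: {np.sqrt(1.0 / N):.5f}")  # should roughly agree
```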
-
Linear mixed-effects models play a fundamental role in statistical methodology. A variety of Markov chain Monte Carlo (MCMC) algorithms exist for fitting these models, but they are inefficient in massive data settings because every iteration of any such algorithm passes through the full data. Many divide-and-conquer methods have been proposed to solve this problem, but they lack theoretical guarantees, impose restrictive assumptions, or require complex computational algorithms. Our focus is one such method, the Wasserstein Posterior (WASP), which has become popular due to its optimal theoretical properties under general assumptions. Unfortunately, practical implementation of the WASP either requires solving a complex linear program or is limited to one-dimensional parameters; the former is inefficient, and the latter fails to capture the joint posterior dependence structure of multivariate parameters. We develop a new algorithm for computing the WASP of multivariate parameters that is easy to implement and applies to any model where the posterior distribution of the parameter belongs to a location-scatter family of probability measures. The algorithm is introduced for linear mixed-effects models, with both implementation details and theoretical properties. It outperforms the current state-of-the-art method in inference on functions of the covariance matrix of the random effects across diverse numerical comparisons.
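For location-scatter families the combined posterior inherits a convenient structure: in the Gaussian special case, the 2-Wasserstein barycenter's mean is the average of the subset means, and its covariance solves a known fixed-point equation. Below is a minimal sketch of that textbook iteration with hypothetical subset summaries; it illustrates the idea rather than the paper's exact algorithm.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_barycenter(means, covs, iters=50):
    """2-Wasserstein barycenter of Gaussians N(means[j], covs[j]): the mean
    is the average of the means; the covariance solves a fixed-point
    equation, iterated here a fixed number of times."""
    m = np.mean(means, axis=0)
    S = np.mean(covs, axis=0)  # a reasonable initialization
    for _ in range(iters):
        R = np.real(sqrtm(S))
        T = np.mean([np.real(sqrtm(R @ C @ R)) for C in covs], axis=0)
        R_inv = np.linalg.inv(R)
        S = R_inv @ T @ T @ R_inv
        S = (S + S.T) / 2.0  # symmetrize away numerical noise
    return m, S

# Hypothetical subset posteriors: summarize each subset's MCMC draws by a
# mean vector and scatter matrix, then combine them into the barycenter.
rng = np.random.default_rng(1)
subset_mcmc = [rng.multivariate_normal(rng.normal(0, 0.1, 2), np.eye(2), 1000)
               for _ in range(8)]
means = [d.mean(axis=0) for d in subset_mcmc]
covs = [np.cov(d.T) for d in subset_mcmc]
m, S = gaussian_barycenter(means, covs)
print("barycenter mean:", m)
print("barycenter scatter:\n", S)
```

Draws from the combined posterior can then be generated as m + S^{1/2} z for standard normal z, which preserves the joint dependence structure that one-dimensional combinations lose.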
-
The brain modifies its synaptic strengths during learning in order to better adapt to its environment. However, the underlying plasticity rules that govern learning are unknown. Many proposals have been suggested, including Hebbian mechanisms, explicit error backpropagation, and a variety of alternatives. It is an open question what specific experimental measurements would need to be made to determine whether any given learning rule is operative in a real biological system. In this work, we take a "virtual experimental" approach to this problem. Simulating idealized neuroscience experiments with artificial neural networks, we generate a large-scale dataset of learning trajectories of aggregate statistics measured across a variety of neural network architectures, loss functions, learning rule hyperparameters, and parameter initializations. We then take a discriminative approach, training linear and simple non-linear classifiers to identify learning rules from features based on these observables. We show that different classes of learning rules can be separated solely on the basis of aggregate statistics of the weights, activations, or instantaneous layer-wise activity changes, and that these results generalize to limited access to the trajectory and to held-out architectures and learning curricula. We identify the statistics of each observable that are most relevant for rule identification, finding that statistics from network activities across training are more robust to unit undersampling and measurement noise than those obtained from the synaptic strengths. Our results suggest that activation patterns, available from electrophysiological recordings of post-synaptic activities on the order of several hundred units, frequently measured at wider intervals over the course of learning, may provide a good basis on which to identify learning rules.
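The discriminative pipeline reduces to two steps: compute aggregate statistics of the measured trajectories, then train a classifier to separate rule classes. In this minimal sketch the synthetic trajectories and the per-rule "drift" signature are entirely made up, standing in for the paper's simulated training runs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def aggregate_stats(trajectory):
    """Summarize an (epochs, units) activity trajectory with simple
    per-epoch aggregate statistics, concatenated into one feature vector."""
    return np.concatenate([
        trajectory.mean(axis=1),
        trajectory.std(axis=1),
        np.linalg.norm(trajectory, axis=1),
    ])

def fake_run(rule, epochs=20, units=300):
    """Synthetic stand-in for a simulated training run; the per-rule drift
    is a made-up signature, not a claim about real learning rules."""
    drift = 0.05 if rule == 0 else 0.15
    return rng.normal(0.0, 1.0, (epochs, units)) + drift * np.arange(epochs)[:, None]

labels = [r for r in (0, 1) for _ in range(200)]
X = np.stack([aggregate_stats(fake_run(r)) for r in labels])
y = np.array(labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```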